HBO

Ashley Wright & Mubeena Wahaj

2023-04-13

Lights, camera, action!

Today, we’re going to take a deep dive into the world of HBO movies and TV shows, from iconic dramas like The Sopranos and Game of Thrones to the latest releases. HBO has been providing quality content to its viewers for decades, but have you ever wondered how it decides which shows to produce or which movies to acquire? That’s where the fascinating world of HBO data comes into play. By analyzing audience trends, ratings, and viewer demographics, HBO can make informed decisions about what to offer its loyal fans. So sit back, grab a snack, and get ready to explore the exciting world of HBO data.

Packages Used:

#load magick to process images
#load tidyverse to manipulate data
#load ggplot2 for graphing
#load shiny to...
#load dplyr to manipulate data
#load knitr for general-purpose literate programming
#load kableExtra to add features to table

library(magick)
library(tidyverse)
library(ggplot2)
library(shiny)
library(dplyr)
library(knitr)
library(kableExtra)

#remember to put how each package is used

About Our Data

The data we’ve decided to work with comes from Kaggle and is owned by Diego Enrique. Here’s the link: https://www.kaggle.com/datasets/dgoenrique/hbo-max-movies-and-tv-shows

Titles data:

15 variables, 3030 observations

id: The title ID

title: The name of the title

type: TV show or movie (SHOW or MOVIE)

description: A description of movie or tv show

release_year: Year show/movie was released

age_certification: The age rating of movie or show

runtime: The length of the movie, or of one episode if it is a show

genres: A list of genres

production_countries: Countries that produced the show/movie

seasons: Number of seasons IF it is a show

imdb_id: The title ID on IMDB

imdb_score: Score on IMDB

imdb_votes: Votes on IMDB

tmdb_popularity: Popularity on TMDB

tmdb_score: Score on TMDB

Credits data:

5 variables, 64879 observations

person_id: The person ID on JustWatch

id: The title ID on JustWatch

name: The name of actor or director

character: The name of the character played in the movie/show

role: ACTOR or DIRECTOR

Let’s read in our data, shall we?

credits = read.csv("credits.csv", stringsAsFactors = FALSE)
titles = read.csv("titles.csv", stringsAsFactors = FALSE)
glimpse(credits)
## Rows: 64,879
## Columns: 5
## $ person_id <int> 14701, 14702, 14703, 14704, 14705, 14706, 1367, 14716, 14707…
## $ id        <chr> "tm77588", "tm77588", "tm77588", "tm77588", "tm77588", "tm77…
## $ name      <chr> "Humphrey Bogart", "Ingrid Bergman", "Paul Henreid", "Claude…
## $ character <chr> "Rick Blaine", "Ilsa Lund", "Victor Laszlo", "Captain Louis …
## $ role      <chr> "ACTOR", "ACTOR", "ACTOR", "ACTOR", "ACTOR", "ACTOR", "ACTOR…
glimpse(titles)
## Rows: 3,030
## Columns: 15
## $ id                   <chr> "tm77588", "tm155702", "tm83648", "tm3175", "ts22…
## $ title                <chr> "Casablanca", "The Wizard of Oz", "Citizen Kane",…
## $ type                 <chr> "MOVIE", "MOVIE", "MOVIE", "MOVIE", "SHOW", "MOVI…
## $ description          <chr> "In Casablanca, Morocco in December 1941, a cynic…
## $ release_year         <int> 1943, 1939, 1941, 1945, 1940, 1940, 1946, 1934, 1…
## $ age_certification    <chr> "PG", "G", "PG", "", "", "G", "", "", "", "PG-13"…
## $ runtime              <int> 102, 102, 119, 113, 8, 238, 114, 93, 111, 109, 12…
## $ genres               <chr> "['drama', 'romance', 'war']", "['fantasy', 'fami…
## $ production_countries <chr> "['US']", "['US']", "['US']", "['US']", "['US']",…
## $ seasons              <dbl> NA, NA, NA, NA, 16, NA, NA, NA, NA, NA, NA, NA, N…
## $ imdb_id              <chr> "tt0034583", "tt0032138", "tt0033467", "tt0037059…
## $ imdb_score           <dbl> 8.5, 8.1, 8.3, 7.5, 7.7, 8.2, 7.9, 7.9, 7.9, 8.3,…
## $ imdb_votes           <dbl> 577842, 406105, 446627, 25589, 859, 319463, 87289…
## $ tmdb_popularity      <dbl> 22.005, 56.631, 19.900, 8.311, 1.400, 27.535, 11.…
## $ tmdb_score           <dbl> 8.167, 7.583, 8.022, 7.000, 10.000, 8.000, 7.700,…
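
The glimpse above shows empty strings ("") in age_certification where no rating is recorded. Here is a minimal sketch of one way to handle that, assuming we want empty strings treated as missing values. The toy data frame below is made up for illustration and only stands in for titles:

```r
library(dplyr)

# Toy stand-in for `titles` (values are illustrative, not from the dataset)
toy_titles <- data.frame(
  id = c("a1", "a2", "a3"),
  age_certification = c("PG", "", "R"),
  imdb_id = c("tt1", "", "tt3"),
  stringsAsFactors = FALSE
)

# Turn empty strings into NA in every character column
toy_clean <- toy_titles %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

# Count missing values per column
colSums(is.na(toy_clean))
```

With the blanks converted, functions such as is.na() and count() report them as missing instead of as a separate empty category.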

Whoops! Let’s make it a little more readable.

Here’s our credits.csv:

kable(head(credits),
      align = "c",
      caption = "Sample table of credits data",
      format = "html")
Sample table of credits data
person_id id name character role
14701 tm77588 Humphrey Bogart Rick Blaine ACTOR
14702 tm77588 Ingrid Bergman Ilsa Lund ACTOR
14703 tm77588 Paul Henreid Victor Laszlo ACTOR
14704 tm77588 Claude Rains Captain Louis Renault ACTOR
14705 tm77588 Conrad Veidt Major Heinrich Strasser ACTOR
14706 tm77588 Sydney Greenstreet Signor Ferrari ACTOR
#kable(head(titles),
 #     align = "c",
  #    caption = "Sample table of titles data",
   #   format = "html")

And here’s our titles.csv

titles <- within(titles, rm(description))
kable(head(titles),
      align = "c",
      caption = "Sample table of titles data",
      format = "html")
Sample table of titles data
id title type release_year age_certification runtime genres production_countries seasons imdb_id imdb_score imdb_votes tmdb_popularity tmdb_score
tm77588 Casablanca MOVIE 1943 PG 102 [‘drama’, ‘romance’, ‘war’] [‘US’] NA tt0034583 8.5 577842 22.005 8.167
tm155702 The Wizard of Oz MOVIE 1939 G 102 [‘fantasy’, ‘family’] [‘US’] NA tt0032138 8.1 406105 56.631 7.583
tm83648 Citizen Kane MOVIE 1941 PG 119 [‘drama’] [‘US’] NA tt0033467 8.3 446627 19.900 8.022
tm3175 Meet Me in St. Louis MOVIE 1945 113 [‘drama’, ‘family’, ‘romance’, ‘music’, ‘comedy’] [‘US’] NA tt0037059 7.5 25589 8.311 7.000
ts225761 Tom and Jerry SHOW 1940 8 [‘animation’, ‘comedy’, ‘family’, ‘action’] [‘US’] 16 tt6422744 7.7 859 1.400 10.000
tm156463 Gone with the Wind MOVIE 1940 G 238 [‘drama’, ‘romance’, ‘war’, ‘history’] [‘US’] NA tt0031381 8.2 319463 27.535 8.000

What if we try to combine these data sets?

both_data <- inner_join(titles, credits, by = "id")

kable(head(both_data),
      align = "c",
      caption = "Sample table of both data",
      format = "html")
Sample table of both data
id title type release_year age_certification runtime genres production_countries seasons imdb_id imdb_score imdb_votes tmdb_popularity tmdb_score person_id name character role
tm77588 Casablanca MOVIE 1943 PG 102 [‘drama’, ‘romance’, ‘war’] [‘US’] NA tt0034583 8.5 577842 22.005 8.167 14701 Humphrey Bogart Rick Blaine ACTOR
tm77588 Casablanca MOVIE 1943 PG 102 [‘drama’, ‘romance’, ‘war’] [‘US’] NA tt0034583 8.5 577842 22.005 8.167 14702 Ingrid Bergman Ilsa Lund ACTOR
tm77588 Casablanca MOVIE 1943 PG 102 [‘drama’, ‘romance’, ‘war’] [‘US’] NA tt0034583 8.5 577842 22.005 8.167 14703 Paul Henreid Victor Laszlo ACTOR
tm77588 Casablanca MOVIE 1943 PG 102 [‘drama’, ‘romance’, ‘war’] [‘US’] NA tt0034583 8.5 577842 22.005 8.167 14704 Claude Rains Captain Louis Renault ACTOR
tm77588 Casablanca MOVIE 1943 PG 102 [‘drama’, ‘romance’, ‘war’] [‘US’] NA tt0034583 8.5 577842 22.005 8.167 14705 Conrad Veidt Major Heinrich Strasser ACTOR
tm77588 Casablanca MOVIE 1943 PG 102 [‘drama’, ‘romance’, ‘war’] [‘US’] NA tt0034583 8.5 577842 22.005 8.167 14706 Sydney Greenstreet Signor Ferrari ACTOR
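
Because inner_join() keeps only matching rows, each title repeats once per credit, and any title without credits drops out. A small sketch of how this could be checked, using made-up toy data frames (toy_titles and toy_credits are hypothetical stand-ins, not the real files):

```r
library(dplyr)

# Toy stand-ins for the two tables (illustrative values only)
toy_titles <- data.frame(id = c("t1", "t2", "t3"),
                         title = c("A", "B", "C"))
toy_credits <- data.frame(id = c("t1", "t1", "t2"),
                          name = c("Ann", "Bob", "Cy"))

# inner_join keeps one row per matching credit, so titles repeat
toy_both <- inner_join(toy_titles, toy_credits, by = "id")
nrow(toy_both)

# anti_join lists titles that have no credits and so vanish from the join
no_credits <- anti_join(toy_titles, toy_credits, by = "id")
no_credits
```

Running the same anti_join() on the real titles and credits would show whether any titles are lost in both_data.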

First, let’s see how many movies and TV shows we are dealing with.

titles %>% 
  count(type)
##    type    n
## 1 MOVIE 2408
## 2  SHOW  622

Wow! That’s a lot more movies than shows! But let’s see it visually.

# Create a data frame with counts of movies and shows
title_counts = data.frame(
  type = c("MOVIE", "SHOW"),
  Count = c(sum(titles$type == "MOVIE"), sum(titles$type == "SHOW"))
)

# Create the bar chart
ggplot(title_counts, aes(x = type, y = Count, fill = type)) +
  geom_bar(stat = "identity") +
  ggtitle("Number of Movies and Shows") +
  xlab("") +
  ylab("Count")
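
As a side note, the counts do not have to be assembled by hand. A sketch of an equivalent approach, shown on a made-up toy data frame: count() returns the tally in a column named n, and geom_col() plots precomputed values just like geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for `titles` (illustrative values only)
toy_titles <- data.frame(type = c("MOVIE", "MOVIE", "SHOW"))

# count() returns one row per type with the tally in column n
toy_counts <- toy_titles %>% count(type)

# geom_col() plots precomputed values
p <- ggplot(toy_counts, aes(x = type, y = n, fill = type)) +
  geom_col() +
  labs(title = "Number of Movies and Shows", x = NULL, y = "Count")
p
```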

What range of release years does our data cover?

range(titles$release_year)
## [1] 1901 2023

Out of curiosity, which titles are the oldest and the newest?

oldest_title <- titles %>%
  filter(release_year == 1901)  # release_year is an integer column

oldest_title
##        id                   title  type release_year age_certification runtime
## 1 tm54582 The Prince of Magicians MOVIE         1901                         2
##       genres production_countries seasons imdb_id imdb_score imdb_votes
## 1 ['comedy']               ['FR']      NA                 NA         NA
##   tmdb_popularity tmdb_score
## 1           1.747          6
newest_title <- titles %>%
  filter(release_year == 2023)  # release_year is an integer column

newest_title
##           id                                                    title  type
## 1   ts226904                                           The Last of Us  SHOW
## 2   ts283518                                                    Velma  SHOW
## 3   ts375134                                                Rain Dogs  SHOW
## 4   ts374449                                                The Climb  SHOW
## 5  tm1301628                           Marc Maron: From Bleak to Dark MOVIE
## 6  tm1040094                                              House Party MOVIE
## 7  tm1310730                              Marlon Wayans: God Loves Me MOVIE
## 8   ts171230                                               Poor Devil  SHOW
## 9  tm1015760                                Chernobyl: The Lost Tapes MOVIE
## 10 tm1306271                         The Weeknd: Live at SoFi Stadium MOVIE
## 11 tm1306569                      Chasing Greatness: Coach K x LeBron MOVIE
## 12 tm1305288                       Marcella Arguello: Bitch, Grow Up! MOVIE
## 13 tm1303655                                 Super-Vilains: l'Enquête MOVIE
## 14 tm1296261 Just a Boy From Tupelo: Bringing Elvis to the Big Screen MOVIE
## 15 tm1065897                       Dionne Warwick: Don't Make Me Over MOVIE
## 16 tm1304306                                       The Family Meeting MOVIE
##    release_year age_certification runtime
## 1          2023             TV-MA      60
## 2          2023             TV-MA      25
## 3          2023                        27
## 4          2023                        44
## 5          2023                        65
## 6          2023                 R     100
## 7          2023                        60
## 8          2023             TV-MA      22
## 9          2023                        96
## 10         2023                 R      98
## 11         2023                        30
## 12         2023                 R      37
## 13         2023             PG-13      62
## 14         2023             PG-13      27
## 15         2023                PG      95
## 16         2023                        15
##                                                genres production_countries
## 1  ['drama', 'action', 'horror', 'scifi', 'thriller']               ['US']
## 2                    ['comedy', 'crime', 'animation']               ['US']
## 3                                 ['drama', 'comedy']         ['GB', 'US']
## 4                                         ['reality']               ['US']
## 5                         ['comedy', 'documentation']               ['US']
## 6                                          ['comedy']               ['US']
## 7                                          ['comedy']               ['US']
## 8                             ['comedy', 'animation']               ['ES']
## 9                        ['documentation', 'history']               ['GB']
## 10                                          ['music']               ['US']
## 11                                                 []               ['BR']
## 12                                         ['comedy']               ['US']
## 13                                  ['documentation']               ['FR']
## 14                                  ['documentation']                   []
## 15                         ['documentation', 'music']         ['US', 'GB']
## 16                                                 []                   []
##    seasons    imdb_id imdb_score imdb_votes tmdb_popularity tmdb_score
## 1        1 tt11915056        9.1     255529        3481.253      8.798
## 2        1 tt14153790        1.5      70034         130.974      3.399
## 3        1 tt19050000         NA         NA           0.600         NA
## 4        1 tt15082926        6.8        376          26.332         NA
## 5       NA tt26453369        7.1        787           6.638      5.000
## 6       NA  tt8005118        4.4       2360         103.564      6.500
## 7       NA tt26753138        6.3        204          15.338      6.700
## 8        1 tt15764846        6.6        265          13.511      7.714
## 9       NA tt13913326        7.9       1424              NA         NA
## 10      NA tt26685153        8.1        257          23.370      5.800
## 11      NA                    NA         NA           3.974         NA
## 12      NA tt26623699        6.9         27           7.509      2.000
## 13      NA tt26498712        5.5         45           3.402      6.000
## 14      NA                    NA         NA           2.605      4.500
## 15      NA  tt6170406        7.8        255           9.371         NA
## 16      NA                    NA         NA           3.091      2.000
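
The filters above hard-code the years returned by range(). A sketch of a variant that looks the extremes up directly, using dplyr's slice_min() and slice_max() on a made-up toy data frame:

```r
library(dplyr)

# Toy stand-in for `titles` (illustrative values only)
toy_titles <- data.frame(
  title = c("Old One", "Middle", "New A", "New B"),
  release_year = c(1901L, 1990L, 2023L, 2023L)
)

# Titles from the earliest release year (ties are kept by default)
oldest <- toy_titles %>% slice_min(release_year, n = 1)

# Titles from the latest release year
newest <- toy_titles %>% slice_max(release_year, n = 1)

oldest
newest
```

This keeps working if the dataset is refreshed and the earliest or latest year changes.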

Now let’s see the top 10 highest-rated movies and shows by IMDb score.

top_10_movies <- titles %>% 
  filter(type == "MOVIE") %>%
  arrange(desc(imdb_score)) %>%
  select(title, type, release_year, genres) %>%
  head(10)
top_10_movies
##                                                title  type release_year
## 1                           The Shawshank Redemption MOVIE         1994
## 2                                    Celebrity Habla MOVIE         2009
## 3                                  Emergency Contact MOVIE         2015
## 4                                    The Dark Knight MOVIE         2008
## 5      The Lord of the Rings: The Return of the King MOVIE         2003
## 6                Euphoria: Trouble Don't Last Always MOVIE         2020
## 7        Juan Luis Guerra 4.40: Entre Mar y Palmeras MOVIE         2021
## 8  The Lord of the Rings: The Fellowship of the Ring MOVIE         2001
## 9              The Lord of the Rings: The Two Towers MOVIE         2002
## 10                                 Celebrity Habla 2 MOVIE         2010
##                                      genres
## 1                                 ['drama']
## 2                         ['documentation']
## 3                                ['comedy']
## 4  ['drama', 'thriller', 'action', 'crime']
## 5            ['fantasy', 'action', 'drama']
## 6                                 ['drama']
## 7                                 ['music']
## 8            ['fantasy', 'action', 'drama']
## 9            ['action', 'fantasy', 'drama']
## 10                        ['documentation']
top_10_shows <- titles %>% 
  filter(type == "SHOW") %>%
  arrange(desc(imdb_score)) %>%
  select(title, type, release_year, genres) %>%
  head(10)



top_10_shows
##                          title type release_year
## 1             Band of Brothers SHOW         2001
## 2                    Chernobyl SHOW         2019
## 3                     The Wire SHOW         2002
## 4            Eyes on the Prize SHOW         1987
## 5                 The Sopranos SHOW         1999
## 6              Game of Thrones SHOW         2011
## 7               Rick and Morty SHOW         2013
## 8                    Homegrown SHOW         2021
## 9               The Last of Us SHOW         2023
## 10 Batman: The Animated Series SHOW         1992
##                                                          genres
## 1                         ['drama', 'war', 'history', 'action']
## 2                              ['drama', 'thriller', 'history']
## 3                                ['drama', 'crime', 'thriller']
## 4                                  ['documentation', 'history']
## 5                                            ['drama', 'crime']
## 6            ['scifi', 'drama', 'action', 'romance', 'fantasy']
## 7                    ['animation', 'scifi', 'action', 'comedy']
## 8                                    ['documentation', 'drama']
## 9            ['drama', 'action', 'horror', 'scifi', 'thriller']
## 10 ['family', 'scifi', 'animation', 'action', 'crime', 'drama']
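
A ranking by score alone can be swayed by titles that have only a handful of votes, which may be why the movie list mixes famous films with much less familiar ones. A sketch of one possible refinement, assuming a minimum vote count is acceptable. The cutoff min_votes is a hypothetical value chosen for illustration, and the toy data is made up:

```r
library(dplyr)

# Toy stand-in for `titles` (illustrative values only)
toy_titles <- data.frame(
  title = c("Widely Seen", "Niche Special", "Solid Hit", "Unrated"),
  type = "MOVIE",
  imdb_score = c(8.5, 9.2, 8.0, NA),
  imdb_votes = c(500000, 40, 120000, NA)
)

min_votes <- 1000  # hypothetical cutoff, not from the original analysis

top_rated <- toy_titles %>%
  # rows with a missing score or vote count are dropped by filter()
  filter(type == "MOVIE", !is.na(imdb_score), imdb_votes >= min_votes) %>%
  arrange(desc(imdb_score)) %>%
  select(title, imdb_score, imdb_votes) %>%
  head(10)

top_rated
```

Keeping imdb_score and imdb_votes in the output also makes it easier to see why each title made the list.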

Now let’s look at the credits data.

credits %>%
  count(role)
##       role     n
## 1    ACTOR 62158
## 2 DIRECTOR  2721

Are any of these actors/directors in multiple projects? If so, who was in the most projects?

project_count <- credits %>%
  count(name)

glimpse(project_count)
## Rows: 45,276
## Columns: 2
## $ name <chr> " Amanda Phillips", "'Auntie' Mackay", "'Little Man' Machan", "'W…
## $ n    <int> 1, 1, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1,…
most_projects <- credits %>% 
  count(name) %>% 
  slice_max(n)

most_projects
##           name  n
## 1 Grey DeLisle 60

Who is this person?

credits %>% 
  filter(name == "Grey DeLisle")
##    person_id        id         name
## 1      14142   ts21507 Grey DeLisle
## 2      14142   ts21601 Grey DeLisle
## 3      14142   ts20381 Grey DeLisle
## 4      14142   ts22480 Grey DeLisle
## 5      14142    ts5042 Grey DeLisle
## 6      14142   tm94209 Grey DeLisle
## 7      14142   tm93784 Grey DeLisle
## 8      14142   tm58727 Grey DeLisle
## 9      14142  tm656353 Grey DeLisle
## 10     14142   tm43061 Grey DeLisle
## 11     14142   tm23583 Grey DeLisle
## 12     14142  tm140798 Grey DeLisle
## 13     14142   ts20574 Grey DeLisle
## 14     14142  tm167763 Grey DeLisle
## 15     14142   tm62126 Grey DeLisle
## 16     14142   tm30342 Grey DeLisle
## 17     14142  tm167388 Grey DeLisle
## 18     14142   tm65596 Grey DeLisle
## 19     14142  tm160240 Grey DeLisle
## 20     14142  tm160231 Grey DeLisle
## 21     14142  tm151689 Grey DeLisle
## 22     14142  tm179476 Grey DeLisle
## 23     14142  tm177932 Grey DeLisle
## 24     14142    ts3139 Grey DeLisle
## 25     14142  tm159051 Grey DeLisle
## 26     14142  tm171655 Grey DeLisle
## 27     14142   tm63619 Grey DeLisle
## 28     14142  tm152510 Grey DeLisle
## 29     14142  tm152501 Grey DeLisle
## 30     14142   ts37956 Grey DeLisle
## 31     14142  tm195247 Grey DeLisle
## 32     14142  tm244555 Grey DeLisle
## 33     14142  tm238389 Grey DeLisle
## 34     14142  tm193859 Grey DeLisle
## 35     14142  tm244564 Grey DeLisle
## 36     14142  tm219341 Grey DeLisle
## 37     14142  tm214711 Grey DeLisle
## 38     14142  tm244119 Grey DeLisle
## 39     14142  tm244479 Grey DeLisle
## 40     14142  tm138025 Grey DeLisle
## 41     14142  tm365731 Grey DeLisle
## 42     14142  tm422417 Grey DeLisle
## 43     14142  tm301058 Grey DeLisle
## 44     14142  tm372838 Grey DeLisle
## 45     14142  tm361837 Grey DeLisle
## 46     14142  tm326858 Grey DeLisle
## 47     14142  tm414009 Grey DeLisle
## 48     14142  tm405461 Grey DeLisle
## 49     14142  tm317933 Grey DeLisle
## 50     14142  tm423754 Grey DeLisle
## 51     14142  tm820756 Grey DeLisle
## 52     14142   ts89867 Grey DeLisle
## 53     14142  tm894108 Grey DeLisle
## 54     14142  tm883958 Grey DeLisle
## 55     14142 tm1248448 Grey DeLisle
## 56     14142 tm1028015 Grey DeLisle
## 57     14142 tm1065433 Grey DeLisle
## 58     14142  tm930306 Grey DeLisle
## 59     14142 tm1171238 Grey DeLisle
## 60     14142  tm987899 Grey DeLisle
##                                                                           character
## 1                                                              Daphne Blake (voice)
## 2                                                        The High Priestess (voice)
## 3                                                              Daphne Blake (voice)
## 4                                                                     Mandy (voice)
## 5                                                  Frances 'Frankie' Foster (voice)
## 6                                                                    Daphne (voice)
## 7                                                              Daphne Blake (voice)
## 8                                             Daphne / Cat Witch / Honeybee (voice)
## 9  Frankie Foster / Tiny Friend / Little Boy Voice / Lady (voice) (as Grey DeLisle)
## 10                                        Crazy Old Cat Lady/Gramma Stuffum (voice)
## 11                                                                   Daphne (voice)
## 12                                                                   Daphne (voice)
## 13                                                                                 
## 14                                                           Barbara Gordon (voice)
## 15                                             Anchor Carla / Female Mutant (voice)
## 16                                                        Lois Lane / Queen (voice)
## 17                                       Ree'Yu / Ardakian Trawl / Boodikka (voice)
## 18                                                         Young Manchester (voice)
## 19                                                             Daphne Blake (voice)
## 20                                                             Daphne Blake (voice)
## 21                                            Grandmother (voice) (as Grey Griffin)
## 22                    Nora Allen / Young Barry Allen / Martha Wayne / Joker (voice)
## 23                                                             Anchor Carla (voice)
## 24                                                Margaret Sorrow / Magpie  (voice)
## 25                       Wonder Woman / Superbaby (voice) (as Grey DeLisle Griffin)
## 26                                                             Daphne Blake (voice)
## 27                                                             Daphne Blake (voice)
## 28                                                             Daphne Blake (voice)
## 29                                                             Daphne Blake (voice)
## 30                                                             Daphne Blake (voice)
## 31                                                          Tina / Platinum (voice)
## 32                                                             Daphne Blake (voice)
## 33                                                             Wonder Woman (voice)
## 34                                                                 Samantha (voice)
## 35                                                             Wonder Woman (voice)
## 36                                                 Wonder Woman / Lois Lane (voice)
## 37                                                             Daphne Blake (voice)
## 38                                                             Daphne Blake (voice)
## 39                                                             Wonder Woman (voice)
## 40                                                             Daphne Blake (voice)
## 41                                Sister Leslie / Jason / Additional Voices (voice)
## 42                                                             Daphne Blake (voice)
## 43                                                             Daphne Blake (voice)
## 44                        Wonder Woman / Diana Prince (voice) and Lois Lane (voice)
## 45                                              Daphne Blake / Black Canary (voice)
## 46                                                             Daphne Blake (voice)
## 47           Diana Prince / Wonder Woman (voice) / Lois Lane (voice) / Ring (voice)
## 48                                                         Wonder Woman / Lois Lane
## 49                                                  Wonder Woman / Platinum (voice)
## 50                                                             Wonder Woman (voice)
## 51                                                               Mrs. Claus (Voice)
## 52                                                             Daphne Blake (voice)
## 53                                                        Additional Voices (voice)
## 54                                         Wonder Woman (voice) / Lois Lane (voice)
## 55                                     Daphne / Daisy / Musketeer 1 / Olive (voice)
## 56                                   Beelzebub / Little Della / Little Jack (voice)
## 57                                         Daphne Blake / Frau Glockenspiel (voice)
## 58                                                                 Lady Eve (voice)
## 59                                              Diana Prince / Wonder Woman (voice)
## 60                                                             Daphne Blake (voice)
##     role
## 1  ACTOR
## 2  ACTOR
## 3  ACTOR
## 4  ACTOR
## 5  ACTOR
## 6  ACTOR
## 7  ACTOR
## 8  ACTOR
## 9  ACTOR
## 10 ACTOR
## 11 ACTOR
## 12 ACTOR
## 13 ACTOR
## 14 ACTOR
## 15 ACTOR
## 16 ACTOR
## 17 ACTOR
## 18 ACTOR
## 19 ACTOR
## 20 ACTOR
## 21 ACTOR
## 22 ACTOR
## 23 ACTOR
## 24 ACTOR
## 25 ACTOR
## 26 ACTOR
## 27 ACTOR
## 28 ACTOR
## 29 ACTOR
## 30 ACTOR
## 31 ACTOR
## 32 ACTOR
## 33 ACTOR
## 34 ACTOR
## 35 ACTOR
## 36 ACTOR
## 37 ACTOR
## 38 ACTOR
## 39 ACTOR
## 40 ACTOR
## 41 ACTOR
## 42 ACTOR
## 43 ACTOR
## 44 ACTOR
## 45 ACTOR
## 46 ACTOR
## 47 ACTOR
## 48 ACTOR
## 49 ACTOR
## 50 ACTOR
## 51 ACTOR
## 52 ACTOR
## 53 ACTOR
## 54 ACTOR
## 55 ACTOR
## 56 ACTOR
## 57 ACTOR
## 58 ACTOR
## 59 ACTOR
## 60 ACTOR
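
One caveat: count(name) tallies credit rows per name, so two different people who share a name would be merged, and someone credited as both actor and director on one title would be counted twice. A sketch of a stricter count, assuming person_id identifies a person uniquely (toy data, made up for illustration):

```r
library(dplyr)

# Toy stand-in for `credits` (illustrative values only)
toy_credits <- data.frame(
  person_id = c(1, 1, 1, 2, 3),
  id = c("t1", "t1", "t2", "t1", "t3"),
  name = c("Sam Lee", "Sam Lee", "Sam Lee", "Sam Lee", "Ana Diaz"),
  role = c("ACTOR", "DIRECTOR", "ACTOR", "ACTOR", "ACTOR")
)

project_counts <- toy_credits %>%
  distinct(person_id, name, id) %>%            # one row per person and title
  group_by(person_id, name) %>%
  summarise(n_titles = n(), .groups = "drop") %>%
  slice_max(n_titles, n = 1)

project_counts
```

In the toy data, the two people named Sam Lee stay separate because they have different person_id values.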

And what distribution of genres do we have across movies and shows?

#MOVIE_genre_data= titles %>% 
#  separate_rows(genres, sep = ", ") %>% 
#  group_by(type = "MOVIE",genres) %>% 
#  summarize(Count = n()) %>% 
#  ungroup()

genre_counts <- titles %>%
  mutate(genres = str_remove_all(genres, "'")) %>% 
  mutate(genres = gsub("\\[", "", genres)) %>% 
  mutate(genres = gsub("\\]", "", genres)) %>% 
  separate_rows(genres, sep = ", ") %>%
  group_by(genres, type) %>%
  summarize(Count = n()) %>%
  ungroup() %>%
  arrange(desc(Count))

# Create the bar chart
ggplot(genre_counts, aes(x = reorder(genres, Count), y = Count, fill = type)) +
  geom_bar(stat = "identity")  +
  labs(x = "Genre", y = "Count", title = "Distribution of Genres") +
  theme_minimal()

The genre labels are hard to read. Let’s flip our coordinates.

# genre_counts <- titles %>%
#   separate_rows(genres, sep = ", ") %>%
#   group_by(genres) %>%
#   summarize(Count = n()) %>%
#   ungroup() %>%   # ungroup() removes any grouping left over from group_by(), so later steps such as arrange() work on the whole data frame rather than within each genre group
#   arrange(desc(Count))

# Create the bar chart
#ggplot(genre_counts, aes(x = reorder(genres, Count), y = Count)) +
#  geom_bar(stat = "identity", fill = "purple")  +
#  coord_flip() +
#  labs(x = "Genre", y = "Count", title = "Distribution of Genres", ) +
#  theme_minimal()

genre_counts <- titles %>%
  mutate(genres = str_remove_all(genres, "'")) %>% 
  mutate(genres = gsub("\\[", "", genres)) %>% 
  mutate(genres = gsub("\\]", "", genres)) %>% 
  separate_rows(genres, sep = ", ") %>%
  group_by(genres, type) %>%
  summarize(Count = n()) %>%
  ungroup() %>%
  arrange(desc(Count))

# Create the bar chart
ggplot(genre_counts, aes(x = reorder(genres, Count), y = Count, fill = type)) +
  geom_bar(stat = "identity")  +
  labs(x = "Genre", y = "Count", title = "Distribution of Genres") +
  theme_minimal()+coord_flip()
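
The genre cleanup can be written a little more compactly. A sketch on made-up toy data, assuming titles with an empty genre list ("[]") should be left out of the counts: a single character class strips the brackets and quotes, and count() replaces the group_by()/summarize() pair.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Toy stand-in for `titles` (illustrative values only)
toy_titles <- data.frame(
  type = c("MOVIE", "SHOW", "MOVIE"),
  genres = c("['drama', 'war']", "['drama']", "[]")
)

toy_genre_counts <- toy_titles %>%
  # remove "[", "]" and "'" in one pass
  mutate(genres = str_remove_all(genres, "[\\[\\]']")) %>%
  separate_rows(genres, sep = ", ") %>%
  # drop the blank genre left behind by "[]"
  filter(genres != "") %>%
  count(genres, type, name = "Count", sort = TRUE)

toy_genre_counts
```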

Here is the number of movies and shows available on HBO Max by release year:

# titles$release_year <- as.Date(paste0(titles$release_year, "-01-01"))  
# convert release_year to date format

# titles$release_date <- as.Date(paste0("01-01-", titles$release_year), format = "%d-%m-%Y")

# create type column

#titles$type <- ifelse(titles$type == "SHOW", "MOVIE", "no")

# count number of titles by year and type
title_counts <- titles %>%
  group_by(release_year, type) %>%
  summarize(count = n())

# plot number of titles by year and type
ggplot(titles , aes(x = release_year, fill = type)) +
  geom_bar() +
  labs(x = "Release Year", y = "Number of Titles", title = "Number of Shows and Movies Available by Year") +
  scale_fill_manual(values = c("SHOW" = "purple", "MOVIE" = "darkgrey")) +
  theme(plot.title = element_text(hjust = 0.5)) 

Which genres are the most popular on TMDB? Let’s find out.

genre_popularity <- titles %>%
  mutate(genres = str_remove_all(genres, "'")) %>% 
  mutate(genres = gsub("\\[", "", genres)) %>% 
  mutate(genres = gsub("\\]", "", genres)) %>% 
  separate_rows(genres, sep = ", ") %>%
  group_by(genres, type,tmdb_popularity,tmdb_score ) %>%
  summarize(Count = n()) %>%
  ungroup() %>%
  arrange(desc(tmdb_popularity))

genre_popularity
## # A tibble: 7,559 × 5
##    genres   type  tmdb_popularity tmdb_score Count
##    <chr>    <chr>           <dbl>      <dbl> <int>
##  1 action   SHOW            3481.       8.80     1
##  2 drama    SHOW            3481.       8.80     1
##  3 horror   SHOW            3481.       8.80     1
##  4 scifi    SHOW            3481.       8.80     1
##  5 thriller SHOW            3481.       8.80     1
##  6 action   MOVIE            696.       7.13     1
##  7 fantasy  MOVIE            696.       7.13     1
##  8 scifi    MOVIE            696.       7.13     1
##  9 action   SHOW             559.       8.4      1
## 10 drama    SHOW             559.       8.4      1
## # … with 7,549 more rows
# Create the bar chart
ggplot(genre_popularity, aes(x = reorder(genres, Count), y = tmdb_popularity, fill = type)) +
  geom_bar(stat = "identity")  +
  labs(x = "Genre", y = "tmdb_popularity", title = "Genres and their popularity") +
  theme_light()+coord_flip()
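
Because the pipeline above groups by tmdb_popularity and tmdb_score as well, nearly every group holds a single title (Count is 1 in the printed rows), so the chart stacks raw popularity values per genre. A sketch of one alternative, assuming a per-genre summary is what we want. The choice of the median is an assumption made here, and the toy data is made up:

```r
library(dplyr)
library(stringr)
library(tidyr)

# Toy stand-in for `titles` (illustrative values only)
toy_titles <- data.frame(
  genres = c("['drama', 'action']", "['drama']", "['comedy']"),
  tmdb_popularity = c(100, 20, NA)
)

toy_genre_popularity <- toy_titles %>%
  mutate(genres = str_remove_all(genres, "[\\[\\]']")) %>%
  separate_rows(genres, sep = ", ") %>%
  filter(genres != "") %>%
  group_by(genres) %>%                      # group by genre only
  summarise(
    n_titles = n(),
    median_popularity = median(tmdb_popularity, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(median_popularity))

toy_genre_popularity
```

A summary like this gives one bar per genre whose height does not depend on how many titles the genre contains.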

Who would’ve known?!